How to create ceph cluster using ROOK in kubernetes
Running a stateful application in POD is not an easy task. I worked on ceph for brief period of time. I thought of using an existing ceph setup to provide persistent storage for kubernetes. For this exercise, created single node ceph cluster. I was facing some issues in integrating existing ceph setup with minikube kubernetes. While I was searching for this issue, I came across this interesting project ROOK which is used to create ceph cluster on top of kubernetes. I really like that idea, in case of on-premise kubernetes deployment where servers are having lot of disks, and these disks can be used to create a distributed ceph cluster using ROOK. It’s not only limited to ceph only other supported storage types are “CockroachDB” and “minio” as per project github page. Okay enought talking, let’s start with some practical work.
ROOK operator can be installed using helm charts or using the examples given in official link. I have used example operator.yaml for the installation, it assumed that RBAC is enabled.
In this article, I am going to use the examples provided in official ROOK github repository.
- First create a ceph operator role.
$ kubectl create -f ceph-operator.yml
namespace "rook-ceph-system" created
customresourcedefinition "clusters.ceph.rook.io" created
customresourcedefinition "filesystems.ceph.rook.io" created
customresourcedefinition "objectstores.ceph.rook.io" created
customresourcedefinition "pools.ceph.rook.io" created
customresourcedefinition "volumes.rook.io" created
clusterrole "rook-ceph-cluster-mgmt" created
role "rook-ceph-system" created
clusterrole "rook-ceph-global" created
serviceaccount "rook-ceph-system" created
rolebinding "rook-ceph-system" created
clusterrolebinding "rook-ceph-global" created
deployment "rook-ceph-operator" created
- List all the resources created in rook-ceph-system namespace. It has two created two daemonsets and one deployment.
$ kubectl get all -n rook-ceph-system
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
ds/rook-ceph-agent 1 1 1 1 1 <none> 1m
ds/rook-discover 1 1 1 1 1 <none> 1m
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/rook-ceph-operator 1 1 1 1 2m
NAME DESIRED CURRENT READY AGE
rs/rook-ceph-operator-86776bbc44 1 1 1 2m
NAME READY STATUS RESTARTS AGE
po/rook-ceph-agent-wq7f7 1/1 Running 0 1m
po/rook-ceph-operator-86776bbc44-t6lpg 1/1 Running 0 2m
po/rook-discover-929cv 1/1 Running 0 1m
- Create ceph cluster using cluster.yml, only two changes which I made were to change the mon
count
from 3 to 1 and changingdataDirHostPath
from/var/lib/rook
to/data/rook
.
$ kubectl create -f cluster.yml
namespace "rook-ceph" created
serviceaccount "rook-ceph-cluster" created
role "rook-ceph-cluster" created
rolebinding "rook-ceph-cluster-mgmt" created
rolebinding "rook-ceph-cluster" created
cluster "rook-ceph" created
If you are going to run the above command without changing the dataDirHostPath
. It will get failed with following error in minikube. In case of minikube, it’s mandatory to change the path.
2018-07-15 11:04:54.456354 I | exec: Running command: ceph-mon --foreground --name=mon.rook-ceph-mon0 --cluster=rook-ceph --mon-data=/var/lib/rook/rook-ceph-mon0/data --conf=/var/lib/rook/rook-ceph-mon0/rook-ceph.config --keyring=/var/lib/rook/rook-ceph-mon0/keyring --public-addr=10.99.145.45:6790 --public-bind-addr=172.17.0.14:6790
2018-07-15 11:04:54.479849 I | rook-ceph-mon0: 2018-07-15 11:04:54.479521 7f247919bec0 0 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable), process (unknown), pid 17
2018-07-15 11:04:54.479878 I | rook-ceph-mon0: 2018-07-15 11:04:54.479581 7f247919bec0 -1 error: monitor data filesystem reached concerning levels of available storage space (available: -2147483648%!b(MISSING)ytes)
2018-07-15 11:04:54.479882 I | rook-ceph-mon0: you may adjust 'mon data avail crit' to a lower value to make this go away (default: 5%!)(MISSING)
2018-07-15 11:04:54.479885 I | rook-ceph-mon0:
2018-07-15 11:04:54.481396 I | rook-ceph-mon0: 2018-07-15 11:04:54.479581 7f247919bec0 -1 error: monitor data filesystem reached concerning levels of available storage space (available: -2147483648%!b(MISSING)ytes)
2018-07-15 11:04:54.481416 I | rook-ceph-mon0: you may adjust 'mon data avail crit' to a lower value to make this go away (default: 5%!)(MISSING)
2018-07-15 11:04:54.481420 I | rook-ceph-mon0:
failed to run mon. failed to start mon: Failed to complete 'rook-ceph-mon0': exit status 28.
- Check all the resources created in
rook-ceph
namespace. Two deploymentss are created.
$ kubectl get all -n rook-ceph
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/rook-ceph-mgr-a 1 1 1 1 1m
deploy/rook-ceph-osd-id-0 1 1 1 1 1m
NAME DESIRED CURRENT READY AGE
rs/rook-ceph-mgr-a-75cc4ccbf4 1 1 1 1m
rs/rook-ceph-mon0 1 1 1 1m
rs/rook-ceph-osd-id-0-77df9d59c5 1 1 1 1m
NAME DESIRED SUCCESSFUL AGE
jobs/rook-ceph-osd-prepare-minikube 1 1 1m
NAME READY STATUS RESTARTS AGE
po/rook-ceph-mgr-a-75cc4ccbf4-5ns2l 1/1 Running 0 1m
po/rook-ceph-mon0-r48g6 1/1 Running 0 1m
po/rook-ceph-osd-id-0-77df9d59c5-ckhvt 1/1 Running 0 1m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/rook-ceph-mgr ClusterIP 10.108.199.19 <none> 9283/TCP 1m
svc/rook-ceph-mgr-dashboard ClusterIP 10.102.88.239 <none> 7000/TCP 1m
svc/rook-ceph-mon0 ClusterIP 10.108.81.241 <none> 6790/TCP 1m
- To check the health of ceph cluster and for other ceph related troubleshooting commands, start rook toolbox POD in rook-ceph namespace.
$ kubectl create -f toolbox.yml
$ kubectl get all -n rook-ceph
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/rook-ceph-mgr-a 1 1 1 1 2m
deploy/rook-ceph-osd-id-0 1 1 1 1 2m
NAME DESIRED CURRENT READY AGE
rs/rook-ceph-mgr-a-75cc4ccbf4 1 1 1 2m
rs/rook-ceph-mon0 1 1 1 2m
rs/rook-ceph-osd-id-0-77df9d59c5 1 1 1 2m
NAME DESIRED SUCCESSFUL AGE
jobs/rook-ceph-osd-prepare-minikube 1 1 2m
NAME READY STATUS RESTARTS AGE
po/rook-ceph-mgr-a-75cc4ccbf4-5ns2l 1/1 Running 0 2m
po/rook-ceph-mon0-r48g6 1/1 Running 0 2m
po/rook-ceph-osd-id-0-77df9d59c5-ckhvt 1/1 Running 0 2m
po/rook-ceph-tools 1/1 Running 0 35s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/rook-ceph-mgr ClusterIP 10.108.199.19 <none> 9283/TCP 2m
svc/rook-ceph-mgr-dashboard ClusterIP 10.102.88.239 <none> 7000/TCP 2m
svc/rook-ceph-mon0 ClusterIP 10.108.81.241 <none> 6790/TCP 2m
- Check the health of ceph cluster issuing the bash command from toolbox POD. Ceph is in healthy state.
$ kubectl exec -it po/rook-ceph-tools bash
error: invalid resource name "po/rook-ceph-tools": [may not contain '/']
HAM-VIAGGARW-02:CEPH viaggarw$ kubectl -n rook-ceph exec -it rook-ceph-tools bash
bash: warning: setlocale: LC_CTYPE: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_COLLATE: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_MESSAGES: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_NUMERIC: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_TIME: cannot change locale (en_US.UTF-8): No such file or directory
[root@rook-ceph-tools /]# ceph -s
cluster:
id: 7dcd3cd9-e00c-45d9-a6b2-67c2d1d87252
health: HEALTH_OK
services:
mon: 1 daemons, quorum rook-ceph-mon0
mgr: a(active)
osd: 1 osds: 1 up, 1 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 bytes
usage: 4194 MB used, 12298 MB / 16492 MB avail
pgs:
- Creat a ceph storageclass and one ceph pool.
$ kubectl create -f storageclass.yml
pool "replicapool" created
storageclass "rook-ceph-block" created
- Verify that pool is created successfully in ceph.
[root@rook-ceph-tools /]# ceph osd lspools
1 replicapool,
- Create a volume inside the ceph using PVC (Persistent Volume claim).
$ cat helloworld-pvc.yml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: myvolume
annotations:
volume.beta.kubernetes.io/storage-class: "rook-ceph-block"
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 1Gi
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
myvolume Bound pvc-8feed0aa-8830-11e8-b365-08002703dcb1 1Gi RWO rook-ceph-block 1m
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-8feed0aa-8830-11e8-b365-08002703dcb1 1Gi RWO Delete Bound default/myvolume rook-ceph-block 0s
- Verify that object is created in ceph pool.
[root@rook-ceph-tools /]# rbd -p replicapool ls
pvc-8feed0aa-8830-11e8-b365-08002703dcb1
[root@rook-ceph-tools /]# rbd -p replicapool info pvc-8feed0aa-8830-11e8-b365-08
rbd image 'pvc-8feed0aa-8830-11e8-b365-08002703dcb1':
size 1024 MB in 256 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.109774b0dc51
format: 2
features: layering
flags:
create_timestamp: Sun Jul 15 13:11:20 2018
- Start the POD which will use the PVC and mount it on
/tmp
. Create some files in the/tmp
filesystem.
$ kubectl exec -it helloworld-7989d67bdb-nz9sc bash
root@helloworld-7989d67bdb-nz9sc:/app# df -Ph
Filesystem Size Used Avail Use% Mounted on
overlay 17G 3.9G 12G 26% /
tmpfs 64M 0 64M 0% /dev
tmpfs 3.1G 0 3.1G 0% /sys/fs/cgroup
/dev/rbd0 976M 2.6M 907M 1% /tmp
/dev/sda1 17G 3.9G 12G 26% /etc/hosts
shm 64M 0 64M 0% /dev/shm
tmpfs 3.1G 12K 3.1G 1% /run/secrets/kubernetes.io/serviceaccount
tmpfs 3.1G 0 3.1G 0% /proc/scsi
tmpfs 3.1G 0 3.1G 0% /sys/firmware
root@helloworld-7989d67bdb-nz9sc:/tmp# touch testfile{1..5}
root@helloworld-7989d67bdb-nz9sc:/tmp# ls
lost+found testfile1 testfile2 testfile3 testfile4 testfile5
- Delete the deployment which we have created. Start the new deployment and verify that same files are present in the /tmp filesystem which indicates that persistence is working fine.
$ kubectl get all
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/helloworld 1 1 1 1 3m
NAME DESIRED CURRENT READY AGE
rs/helloworld-7989d67bdb 1 1 1 3m
NAME READY STATUS RESTARTS AGE
po/helloworld-7989d67bdb-nz9sc 1/1 Running 0 3m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 1h
$ kubectl delete deploy/helloworld
deployment "helloworld" deleted
$ kubectl create -f helloworld-with-volume.yml
deployment "helloworld" created
$ kubectl exec -it helloworld-7989d67bdb-4kcbl bash
root@helloworld-7989d67bdb-4kcbl:/app# df -Ph
Filesystem Size Used Avail Use% Mounted on
overlay 17G 3.9G 12G 26% /
tmpfs 64M 0 64M 0% /dev
tmpfs 3.1G 0 3.1G 0% /sys/fs/cgroup
/dev/rbd0 976M 2.6M 907M 1% /tmp
/dev/sda1 17G 3.9G 12G 26% /etc/hosts
shm 64M 0 64M 0% /dev/shm
tmpfs 3.1G 12K 3.1G 1% /run/secrets/kubernetes.io/serviceaccount
tmpfs 3.1G 0 3.1G 0% /proc/scsi
tmpfs 3.1G 0 3.1G 0% /sys/firmware
root@helloworld-7989d67bdb-4kcbl:/app# cd /tmp/
root@helloworld-7989d67bdb-4kcbl:/tmp# ls
lost+found testfile1 testfile2 testfile3 testfile4 testfile5
- Try to start another, I just modify the deployment name ;) but that POD will not start because our volume is of type
ReadWriteOnce
which means it can be mount in RW mode only in one POD hence another POD which is trying to use this volume is stuck in ContainerCreating, and reason can be confirmed from the events inkubectl describe pod/helloworld1-7989d67bdb-dv7qs
command.
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
helloworld-7989d67bdb-4kcbl 1/1 Running 0 6m
helloworld1-7989d67bdb-dv7qs 0/1 ContainerCreating 0 4m
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m default-scheduler Successfully assigned helloworld1-7989d67bdb-dv7qs to minikube
Normal SuccessfulMountVolume 3m kubelet, minikube MountVolume.SetUp succeeded for volume "default-token-w5vbd"
Warning FailedMount 1m kubelet, minikube Unable to mount volumes for pod "helloworld1-7989d67bdb-dv7qs_default(0c6184c1-8832-11e8-b365-08002703dcb1)": timeout expired waiting for volumes to attach or mount for pod "default"/"helloworld1-7989d67bdb-dv7qs". list of unmounted volumes=[myvolume]. list of unattached volumes=[myvolume default-token-w5vbd]
Warning FailedMount 1m (x9 over 3m) kubelet, minikube MountVolume.SetUp failed for volume "pvc-8feed0aa-8830-11e8-b365-08002703dcb1" : mount command failed, status: Failure, reason: Rook: Mount volume failed: failed to attach volume pvc-8feed0aa-8830-11e8-b365-08002703dcb1 for pod default/helloworld1-7989d67bdb-dv7qs. Volume is already attached by pod default/helloworld-7989d67bdb-4kcbl. Status Running
- As soon as you delete the original POD, we can see that POD which was stuck in ContainerCreating state is now started.
$ kubectl delete deploy helloworld
deployment "helloworld" deleted
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
helloworld1-7989d67bdb-dv7qs 1/1 Running 0 11m
Installing operator using hook.
- If you are planning to use helm for operator installatin then after adding the repository you may see two available charts.
$ helm repo add rook-master https://charts.rook.io/master
$ helm search rook
NAME CHART VERSION APP VERSION DESCRIPTION
rook-master/rook v0.7.0-136.gd13bc83 File, Block, and Object Storage Services for yo...
rook-master/rook-ceph v0.7.0-284.g863c10f File, Block, and Object Storage Services for yo...
If you are wondering about the difference in rook and rook-ceph then this is the info which I got from community.
For 0.8 with the mulitple storage backends introduced a split has happened in `master` which moves from "one" Rook operator to multiple Rook operator (e.g. when a chart for CockroachDB has been created, a chart probably named `rook-master/rook-cockroachdb` will/should be available).
`rook-master/rook` will become obsolete with release of `v0.8`